Abstract

Motivated by the information bound for the asymptotic variance of M-estimates for scale, we define Fisher information of scale of any distribution function $F$ on the real line as the supremum of all $\bigl(\int x\phi'(x)\,F(dx)\bigr)^2 \big/ \int \phi^2(x)\,F(dx)$, where $\phi$ ranges over the continuously differentiable functions with derivative of compact support and where, by convention, $0/0 := 0$. In addition, we enforce equivariance by a scale factor. Fisher information of scale is weakly lower semicontinuous and convex. It is finite iff the usual assumptions on densities hold, under which Fisher information of scale is classically defined, and then both classical and our notions agree. Finiteness of Fisher information of scale is also equivalent to $L_2$-differentiability and local asymptotic normality, respectively, of the scale model induced by $F$.

1. Motivation and Definition

If $F$ is any distribution function on $\mathbb{R}$, the real line, and $\phi\colon\mathbb{R}\to\mathbb{R}$ a suitable scores function such that $\int\phi\,dF = 0$, an M-estimate of scale $S_n$ may formally be defined by

$$\sum_{i=1}^n \phi\Bigl(\frac{x_i}{S_n}\Bigr) = 0. \tag{1.1}$$

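The estimand refers to the scale model $(F_\sigma)_{0<\sigma<\infty}$ induced by $F = F_1$, where $F_\sigma(x) = F(x/\sigma)$. Taylor expanding $\phi(x/s) = \phi(x/\sigma) - (s-\sigma)\,\phi'(x/\sigma)\,x/\sigma^2 + \cdots$, we formally obtain

$$\sqrt{n}\,(S_n-\sigma) \approx \sigma\,\frac{n^{-1/2}\sum_{i=1}^n \phi(x_i/\sigma)}{n^{-1}\sum_{i=1}^n \phi'(x_i/\sigma)\,x_i/\sigma} \tag{1.2}$$

such that under observations $x_1,\dots,x_n$ i.i.d. $\sim F_\sigma$ and assuming sufficient regularity, in particular consistency, $\sqrt{n}\,(S_n-\sigma)$ will as $n\to\infty$ be asymptotically normal with mean zero and variance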



$$V(\phi,F_\sigma) = \sigma^2\,V_1(\phi,F), \qquad V_1(\phi,F) := \frac{\int\phi^2(x)\,F(dx)}{\bigl(\int x\phi'(x)\,F(dx)\bigr)^2}. \tag{1.3}$$

If $\phi$ is differentiable with continuous derivative of compact support, both $\phi(x)$ and $x\phi'(x)$ are bounded, so the integrals in (1.3) are well-defined for any distribution $F$ on the Borel $\sigma$-algebra $\mathbb{B}$ of $\mathbb{R}$. As in the theory of generalized functions (Rudin (1991, Ch. 6)), regularity conditions are shifted to the test functions whenever possible.
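As a numerical illustration of (1.1) and (1.3), the following minimal sketch solves the M-equation by bisection and evaluates $V_1(\phi,F)$ by quadrature. The score $\phi(x) = x^2 - 1$, the standard normal $F$, the sample size, the seed, and all function names are assumptions made for this example. This $\phi$ is used only in the formal sense of (1.1), since its derivative is not of compact support.

```python
import math
import random

def phi(x):
    """Illustrative score phi(x) = x^2 - 1 (an assumption, not taken from the text)."""
    return x * x - 1.0

def m_estimate_scale(xs, lo=1e-8, hi=1e8, iters=200):
    """Solve sum_i phi(x_i / s) = 0 for s by bisection on a logarithmic scale."""
    def g(s):
        return sum(phi(x / s) for x in xs)
    # For this phi, g is decreasing in s: positive for small s, negative for large s.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(h, a=-12.0, b=12.0, n=24000):
    """Composite trapezoid rule on [a, b]."""
    step = (b - a) / n
    total = 0.5 * (h(a) + h(b))
    for i in range(1, n):
        total += h(a + i * step)
    return total * step

random.seed(0)
sigma = 3.0
xs = [random.gauss(0.0, sigma) for _ in range(2000)]
s_n = m_estimate_scale(xs)
closed_form = math.sqrt(sum(x * x for x in xs) / len(xs))

# V_1(phi, F) from (1.3) for F = N(0,1); phi'(x) = 2x, so x phi'(x) = 2 x^2.
v1 = (integrate(lambda x: phi(x) ** 2 * normal_pdf(x))
      / integrate(lambda x: 2.0 * x * x * normal_pdf(x)) ** 2)
```

For this particular score, (1.1) has the closed-form solution $S_n = (n^{-1}\sum_i x_i^2)^{1/2}$, which the bisection reproduces, and the quadrature returns $V_1 = 1/2$ under the standard normal.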

The usual information bound for asymptotic variance would say that $V(\phi,F_\sigma) \ge \mathcal{I}_s^{-1}(F_\sigma)$ and, hopefully, the lower bound will also be achieved.

This leads us to the following definition of $\mathcal{I}_{s,1}(F)$. The extension to $\mathcal{I}_s(F_\sigma)$ for the scale transforms $F_\sigma$ of $F$ matches (1.3).

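Definition 1.1. Fisher information of scale, for any distribution $F$ on the real line, is defined by

$$\mathcal{I}_{s,1}(F) := \sup_{\phi\in\mathcal{C}_{c,1}} \frac{\bigl(\int x\phi'(x)\,F(dx)\bigr)^2}{\int\phi^2(x)\,F(dx)}, \tag{1.4}$$

where $\mathcal{C}_{c,1}$ denotes the set of all differentiable functions $\phi\colon\mathbb{R}\to\mathbb{R}$ whose derivative is continuous and of compact support, and $0/0 := 0$ by convention. For the scale transforms $F_\sigma$ of $F$ we define

$$\mathcal{I}_s(F_\sigma) := \sigma^{-2}\,\mathcal{I}_{s,1}(F), \qquad 0<\sigma<\infty. \tag{1.5}$$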

Remark 1.2. Since the map $\phi\mapsto\phi_\sigma$, where $\phi_\sigma(x) := \phi(\sigma x)$ and $\phi_\sigma'(x) = \sigma\phi'(\sigma x)$, defines a one-to-one correspondence on $\mathcal{C}_{c,1}$, we obtain scale invariance of $\mathcal{I}_{s,1}$,

$$\mathcal{I}_{s,1}(F_\sigma) = \mathcal{I}_{s,1}(F), \qquad 0<\sigma<\infty. \tag{1.6}$$

So extension (1.5) is needed to obtain scale equivariance. In the scale model, as opposed to location, it matters whether a given distribution $F$ is considered the element $F = F_1$ or, for example, an element with scale parameter other than 1 in the scale model generated by a rescaled version of $F$. □
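To make the supremum in (1.4) concrete, the sketch below evaluates the quotient for $F = N(0,1)$ and a hypothetical family of test functions in $\mathcal{C}_{c,1}$, namely those with $\phi'(x) = 2x(1 - x^2/c^2)$ on $[-c,c]$ and $\phi' = 0$ elsewhere. The family, the cutoffs $c$, and the trapezoid quadrature are assumptions of this example. For the standard normal, Theorem 2.2 below gives the value $\int (x^2-1)^2\,dF = 2$, so the quotients should stay below 2 and approach it as $c$ grows.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(h, a=-12.0, b=12.0, n=24000):
    """Composite trapezoid rule on [a, b]."""
    step = (b - a) / n
    total = 0.5 * (h(a) + h(b))
    for i in range(1, n):
        total += h(a + i * step)
    return total * step

def make_test_function(c):
    """phi with phi'(x) = 2x(1 - x^2/c^2) on [-c, c] and phi' = 0 elsewhere."""
    def dphi(x):
        return 2.0 * x * (1.0 - (x / c) ** 2) if abs(x) <= c else 0.0
    def phi(x):
        y = min(abs(x), c)  # phi is constant outside [-c, c]
        return y * y - y ** 4 / (2.0 * c * c) - 1.0
    return phi, dphi

def ratio(c):
    """Quotient in (1.4) for F = N(0,1) and the test function with cutoff c."""
    phi, dphi = make_test_function(c)
    num = integrate(lambda x: x * dphi(x) * normal_pdf(x)) ** 2
    den = integrate(lambda x: phi(x) ** 2 * normal_pdf(x))
    return num / den

ratios = {c: ratio(c) for c in (3.0, 5.0, 8.0)}
```

Each value in `ratios` is a lower bound for $\mathcal{I}_{s,1}(N(0,1))$; the bounds increase with the cutoff because the test function then resembles the unbounded score $x^2 - 1$ more closely.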

Motivated by the information bound, Definition 1.1 is intrinsically statistical. It does not a priori use the assumption of, and suitable conditions on, densities. These properties rather follow from the definition in case $\mathcal{I}_s$ is finite. Another advantage is that Definition 1.1 implies certain topological properties (convexity and lower semicontinuity) of $\mathcal{I}_s$.



The definition parallels Huber (1981, Def. 4.1) in the location case,

$$\mathcal{I}_l(F) := \sup_{\phi\in\mathcal{C}_c^1} \frac{\bigl(\int\phi'(x)\,F(dx)\bigr)^2}{\int\phi^2(x)\,F(dx)}, \tag{1.7}$$

where $\phi$, subject to $\int\phi^2\,dF > 0$, ranges over the (smaller) set $\mathcal{C}_c^1$ of all continuously differentiable functions which themselves are of compact support. $\mathcal{I}_l$ is shift invariant.

Huber (1981, p. 79) states vague lower semicontinuity and convexity of $\mathcal{I}_l$. By Huber (1981, Thm. 4.2), $\mathcal{I}_l(F)$ is finite iff $F$ is absolutely continuous with an absolutely continuous density $f$ such that $f'/f \in L_2(F)$, in which case $\mathcal{I}_l(F) = \int (f'/f)^2\,dF$.

Remark 1.3. The latter result, by arguments of the proof to Theorem 2.2 below, still obtains if definition (1.7) is based on $\mathcal{C}_{c,1}$. Only vague lower semicontinuity of $\mathcal{I}_l$ would be weakened to weak lower semicontinuity (which, however, makes no difference in the setup of normed measures). The convention $0/0 := 0$ could replace the side condition $\phi \ne 0$ a.e. $F$ in (1.7) as well.

The non-suitability of $\mathcal{C}_c^1$, and suitability of $\mathcal{C}_{c,1}$ instead, is the tribute to the scale model, for which the functions $x\mapsto x\phi'(x)$ need to be dense in $L_2(F_0)$ with respect to the punctuated (substochastic) measure $F_0$ introduced in (2.1) below. □

Fisher information of scale has been treated by Huber (1964, 1981) not in the previous generality but only under suitable assumptions on densities and, in an auxiliary way, has been reduced to the location case by symmetrization and the log-transform; see Huber (1981, Sec. 5.6).



2. Main Results 

Proposition 2.1. $\mathcal{I}_{s,1}$ is weakly lower semicontinuous and convex.
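The convexity statement can be checked numerically on an example. The sketch below uses the density representation of Theorem 2.2 below, written as $\int (f + x f')^2/f\,d\lambda$, for $N(0,1)$, $N(0,4)$, and their equal-weight mixture. The choice of distributions, the mixing weight, and the trapezoid quadrature are assumptions of this example, not part of the proposition.

```python
import math

def normal_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_dpdf(x, sigma):
    """Derivative of the N(0, sigma^2) density."""
    return -x / sigma ** 2 * normal_pdf(x, sigma)

def scale_information(pdf, dpdf, a=-40.0, b=40.0, n=40000):
    """Trapezoid value of the integral of (f + x f')^2 / f over [a, b]."""
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * step
        f = pdf(x)
        if f > 0.0:  # skip points where the density underflows to zero
            w = 0.5 if i in (0, n) else 1.0
            total += w * (f + x * dpdf(x)) ** 2 / f
    return total * step

s = 0.5  # mixing weight (an assumption)
i1 = scale_information(lambda x: normal_pdf(x, 1.0), lambda x: normal_dpdf(x, 1.0))
i2 = scale_information(lambda x: normal_pdf(x, 2.0), lambda x: normal_dpdf(x, 2.0))
i_mix = scale_information(
    lambda x: (1 - s) * normal_pdf(x, 1.0) + s * normal_pdf(x, 2.0),
    lambda x: (1 - s) * normal_dpdf(x, 1.0) + s * normal_dpdf(x, 2.0),
)
```

Both components return the value 2, in line with the scale invariance (1.6), while the mixture returns a strictly smaller value, as convexity requires.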

Zero observations do not contain any information about scale. Removing the mass of any distribution $F$ at zero, we define the punctuated, possibly substochastic measure $F_0$ by

$$F_0 := F - F(\{0\})\,\mathrm{I}_0, \tag{2.1}$$

where $\mathrm{I}_0$ denotes Dirac measure at 0. In terms of distribution functions, denoting by $\mathrm{I}_{[0,\infty)}$ the indicator function, we have $F_0(x) = F(x) - \bigl(F(0) - F(0-)\bigr)\,\mathrm{I}_{[0,\infty)}(x)$.

Theorem 2.2. For any distribution $F$ on the real line, $\mathcal{I}_{s,1}(F)$ is finite iff

i) $F_0$ is absolutely continuous with a density $f$ such that
ii) $x\mapsto x f(x)$ is absolutely continuous, and
iii) $x\mapsto \Lambda(x) := -[x f(x)]'/f(x) \in L_2(F_0)$,

in which case

$$\mathcal{I}_{s,1}(F) = \int\Lambda^2\,dF_0 = \int_{x\ne 0} \bigl[1 + x f'(x)/f(x)\bigr]^2\,F(dx).$$
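The closed-form expression of the theorem is easy to evaluate by quadrature. The following sketch does so for the standard normal, the standard Laplace distribution, and a normal distribution contaminated by a point mass at zero. The three examples, the helper name `scale_information`, and the integration grid are assumptions chosen for illustration. The expected values follow from direct calculation: $\Lambda(x) = x^2 - 1$ gives 2 for the normal, $\Lambda(x) = |x| - 1$ gives 1 for the Laplace, and an atom of size $p$ at zero multiplies the value by $1-p$.

```python
import math

def scale_information(pdf, dlogpdf, atom_at_zero=0.0, a=-40.0, b=40.0, n=80000):
    """Trapezoid value of the integral over x != 0 of (1 + x f'/f)^2 F(dx), where F has
    an atom of size atom_at_zero at 0 and density (1 - atom_at_zero) * pdf elsewhere."""
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * step
        w = 0.5 if i in (0, n) else 1.0
        total += w * (1.0 + x * dlogpdf(x)) ** 2 * pdf(x)
    return (1.0 - atom_at_zero) * total * step

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_dlog(x):
    return -x  # f'/f for the standard normal

def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))

def laplace_dlog(x):
    # f'/f = -sign(x); the value at the kink x = 0 is irrelevant (multiplied by x)
    return -math.copysign(1.0, x) if x != 0.0 else 0.0

info_normal = scale_information(normal_pdf, normal_dlog)
info_laplace = scale_information(laplace_pdf, laplace_dlog)
info_contaminated = scale_information(normal_pdf, normal_dlog, atom_at_zero=0.25)
```

The contaminated case illustrates the role of the punctuated measure $F_0$: the mass at zero contributes nothing, so only the absolutely continuous part of weight $0.75$ enters.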



3. Consequences for the Scale Model 

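For the scale transforms $F_\sigma$ of $F$, $\mathcal{I}_{s,1}(F_\sigma) = \mathcal{I}_{s,1}(F)$ and $\mathcal{I}_s(F_\sigma) = \sigma^{-2}\mathcal{I}_{s,1}(F)$ by (1.6) and (1.5), respectively. In particular, $\mathcal{I}_{s,1}(F_\sigma)$ and $\mathcal{I}_s(F_\sigma)$ are finite iff $\mathcal{I}_{s,1}(F)$ is finite. Also conditions i) and ii) of Theorem 2.2 are simultaneously fulfilled for a density $f$ of $F_0$ and the density $f_\sigma(x) = \sigma^{-1} f(x/\sigma)$ of the punctuation $F_{\sigma,0}$ of $F_\sigma$. In the finite case, since $-[x f_\sigma(x)]'/f_\sigma(x)$ in condition iii) of Theorem 2.2 is just $\Lambda(x/\sigma)$, this theorem yields $\mathcal{I}_{s,1}(F_\sigma) = \int\Lambda^2(x/\sigma)\,F_{\sigma,0}(dx)$, which is $\int\Lambda^2(x)\,F_0(dx) = \mathcal{I}_{s,1}(F)$; that is, (1.6) again. Therefore, in the finite case,

$$\mathcal{I}_s(F_\sigma) = \int\Lambda_\sigma^2\,dF_{\sigma,0}, \qquad 0<\sigma<\infty, \tag{3.1}$$

is the representation of $\mathcal{I}_s(F_\sigma)$ in terms of the usual score function $\Lambda_\sigma$,

$$\Lambda_\sigma(x) := \frac{1}{\sigma}\,\Lambda\Bigl(\frac{x}{\sigma}\Bigr) = \frac{\partial}{\partial\sigma}\log f_\sigma(x) = -\frac{1}{\sigma}\Bigl(1 + \frac{x}{\sigma}\,\frac{f'(x/\sigma)}{f(x/\sigma)}\Bigr). \tag{3.2}$$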
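Both displays can be verified numerically in the normal scale model, where $\Lambda(u) = u^2 - 1$. The sketch below compares $\Lambda_\sigma$ from (3.2) with a central finite difference of $\sigma\mapsto\log f_\sigma(x)$ and evaluates the integral in (3.1) by the trapezoid rule. The evaluation point, the step `eps`, and the function names are assumptions of this example.

```python
import math

def log_f_sigma(x, sigma):
    """log of f_sigma(x) = f(x / sigma) / sigma for the standard normal density f."""
    return -math.log(sigma) - 0.5 * (x / sigma) ** 2 - 0.5 * math.log(2.0 * math.pi)

def score(x, sigma):
    """Lambda_sigma(x) = Lambda(x / sigma) / sigma with Lambda(u) = u^2 - 1."""
    return ((x / sigma) ** 2 - 1.0) / sigma

def information(sigma, n=40000):
    """Trapezoid value of the integral of Lambda_sigma^2 dF_sigma over [-12 sigma, 12 sigma]."""
    a, b = -12.0 * sigma, 12.0 * sigma
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * step
        w = 0.5 if i in (0, n) else 1.0
        total += w * score(x, sigma) ** 2 * math.exp(log_f_sigma(x, sigma))
    return total * step

sigma, x, eps = 2.0, 1.3, 1e-6
finite_diff = (log_f_sigma(x, sigma + eps) - log_f_sigma(x, sigma - eps)) / (2.0 * eps)
info = information(sigma)
```

With $\sigma = 2$ the integral equals $\sigma^{-2}\cdot 2 = 0.5$, which is the equivariance (1.5) for the normal model.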



As an analogue to a lemma due to Hájek (1972) in the location case, Swensen (1980, Ch. 2, Sec. 3) for an absolutely continuous $F$ has shown that conditions i)-iii) of Theorem 2.2 even imply $L_2$-differentiability (Rieder, 1994, Def. 2.3.6) of the scale model,

$$\bigl\|\sqrt{dF_{\sigma+t}} - \sqrt{dF_\sigma}\,\bigl(1 + \tfrac12\,t\,\Lambda_\sigma\bigr)\bigr\| = o(t) \quad\text{as } t\to 0 \tag{3.3}$$



at $\sigma = 1$ and, by invariance, at any $0<\sigma<\infty$. By definition, $L_2$-differentiability already entails that $\int\Lambda_\sigma^2\,dF_\sigma < \infty$. Setting $\Lambda(0) := 0$, we may extend his result to $F(\{0\}) > 0$.

Proposition 3.1. Assume that $\mathcal{I}_{s,1}(F) < \infty$. Then the scale model $(F_\sigma)_{0<\sigma<\infty}$ is $L_2$-differentiable with derivative $\Lambda_\sigma$ at every $0<\sigma<\infty$.
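The $o(t)$ statement in (3.3) can be observed numerically. The sketch below computes the $L_2(\lambda)$ norm of the remainder for the normal scale model at $\sigma = 1$ and divides it by $t$ for a decreasing sequence of $t$. The normal model, the grid, and the chosen values of $t$ are assumptions of this example.

```python
import math

def sqrt_density(x, sigma):
    """Square root of the N(0, sigma^2) density."""
    return math.exp(-0.25 * (x / sigma) ** 2) / math.sqrt(sigma * math.sqrt(2.0 * math.pi))

def score(x, sigma):
    """Lambda_sigma(x) for the normal scale model."""
    return ((x / sigma) ** 2 - 1.0) / sigma

def remainder_norm(t, sigma=1.0, a=-20.0, b=20.0, n=40000):
    """L2(lambda) norm of sqrt(f_{sigma+t}) - sqrt(f_sigma) * (1 + t * Lambda_sigma / 2)."""
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * step
        w = 0.5 if i in (0, n) else 1.0
        r = (sqrt_density(x, sigma + t)
             - sqrt_density(x, sigma) * (1.0 + 0.5 * t * score(x, sigma)))
        total += w * r * r
    return math.sqrt(total * step)

# remainder divided by t, for t = 0.1, 0.01, 0.001
rates = [remainder_norm(t) / t for t in (0.1, 0.01, 0.001)]
```

The entries of `rates` shrink roughly in proportion to $t$, which is what one expects when the remainder is of second order in this smooth model.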

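$L_2$-differentiability of a parametric model implies an expansion of the log-likelihoods, see e.g. Rieder (1994, Thm. 2.3.5); in our case, for each $h\in\mathbb{R}$,

$$\sum_{i=1}^n \log\frac{dF_{\sigma+h/\sqrt{n}}}{dF_\sigma}(x_i) = \frac{h}{\sqrt{n}}\sum_{i=1}^n \Lambda_\sigma(x_i) - \frac{h^2}{2}\,\mathcal{I}_s(F_\sigma) + o(1) \quad\text{in } F_\sigma^n\text{-probability}, \tag{3.4}$$
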
that is, the scale model is locally asymptotically normal (LAN). LAN is the basis of asymptotic optimality results such as Hájek's Asymptotic Convolution Theorem and the Local Asymptotic Minimax Theorem, see e.g. Rieder (1994, Thms. 3.2.3, 3.3.8) and van der Vaart (1998, Thms. 8.8, 8.11). Le Cam (1986, 17.3 Prop. 2) even shows that, in the i.i.d. setup, LAN is equivalent to $L_2$-differentiability. Thus we obtain the following result.

Proposition 3.2. The following statements are equivalent: 

i) $\mathcal{I}_s(F_\sigma) < \infty$ at some $0<\sigma<\infty$.
ii) The scale model is $L_2$-differentiable at some $0<\sigma<\infty$.
iii) The scale model has the LAN property (3.4) at some $0<\sigma<\infty$.

By invariance, the validity of each statement at one $\sigma$ implies its validity at any other $0<\sigma<\infty$.

Appendix A. Proofs and Absolute Continuity 

Proof of Proposition 2.1. The sup over a family of l.s.c., resp. convex, functions being l.s.c., resp. convex, it suffices to show that, for each $\phi\in\mathcal{C}_{c,1}$, the reciprocal function $V_1^{-1}(\phi,\cdot)$ from (1.3) is weakly l.s.c. and convex. In this proof only, we pay a price for the simplifying convention $0/0 := 0$.

Let $F_n\to F$ weakly. Then $\int\phi^2\,dF_n\to\int\phi^2\,dF$ and $\int x\phi'\,dF_n\to\int x\phi'\,dF$, since $\phi^2$ and $x\phi'$ are bounded and continuous. First assume $\int\phi^2\,dF > 0$. Then $\int\phi^2\,dF_n > 0$ eventually, and $V_1^{-1}(\phi,F_n)\to V_1^{-1}(\phi,F)$. Secondly suppose that $\int\phi^2\,dF = 0$. If also $\int x\phi'\,dF = 0$, then $V_1^{-1}(\phi,F) = 0 \le V_1^{-1}(\phi,F_n)$ for all $n$. If $\int x\phi'\,dF \ne 0$, then $\int\phi^2\,dF_n\to 0$ and $\int x\phi'\,dF_n\to\int x\phi'\,dF\ne 0$, hence $V_1^{-1}(\phi,F_n)$ tends to $\infty = V_1^{-1}(\phi,F)$.

Given $F_1$, $F_2$, $s\in(0,1)$, put $F = (1-s)F_1 + sF_2$. In case both $\int\phi^2\,dF_j > 0$, we get $V_1^{-1}(\phi,F) \le (1-s)\,V_1^{-1}(\phi,F_1) + s\,V_1^{-1}(\phi,F_2)$ from Huber (1981, Lemma 4.4). Secondly, let $\int\phi^2\,dF_1 = 0 < \int\phi^2\,dF_2$. Then, if $\int x\phi'\,dF_1 = 0$, we have $V_1^{-1}(\phi,F_1) = 0$ and $V_1^{-1}(\phi,F) = s\,V_1^{-1}(\phi,F_2) = (1-s)\cdot 0 + s\,V_1^{-1}(\phi,F_2)$. If $\int x\phi'\,dF_1\ne 0$, then $V_1^{-1}(\phi,F_1) = \infty$ and $(1-s)\cdot\infty + s\,V_1^{-1}(\phi,F_2) \ge V_1^{-1}(\phi,F)$. Thirdly, let both $\int\phi^2\,dF_j$ be zero. Then, if also both $\int x\phi'\,dF_j = 0$, we get $V_1^{-1}(\phi,F) = 0$. At least one $\int x\phi'\,dF_j$ nonzero implies that $(1-s)\,V_1^{-1}(\phi,F_1) + s\,V_1^{-1}(\phi,F_2) = \infty$. □

Lemma A.1. For any finite measure $F$ on $\mathbb{B}$, the class $\mathcal{C}_{c,1}$ is dense in $L_2(F)$. If $F(\{0\}) = 0$, the related class $\mathcal{X}_{c,1} := \{x\mapsto x\phi'(x) \mid \phi\in\mathcal{C}_{c,1}\}$ is dense in $L_2(F)$. There exist functions $0\le\phi_n\le 1$ in $\mathcal{C}_{c,1}$ such that $\sup_{n,x}|x\phi_n'(x)| < \infty$, $\lim_n x\phi_n'(x) = 0$, and $\phi_n(x)\uparrow 1$, respectively $\phi_n(x)\downarrow\mathrm{I}_{\{x=0\}}$ pointwise.

Proof. On the basis of Lusin's theorem, Rudin (1974, Thm. 3.14), it suffices to approximate the indicator of bounded intervals $(a,b]$.

For $\varepsilon\downarrow 0$ one may choose functions $g_\varepsilon\in\mathcal{C}_{c,1}$ such that $0\le g_\varepsilon\le 1$, $g_\varepsilon = 1$ on $[a+\varepsilon,b]$, $g_\varepsilon = 0$ on $(-\infty,a]\cup[b+\varepsilon,\infty)$. Then $g_\varepsilon\to\mathrm{I}_{(a,b]}$ pointwise, and $g_\varepsilon\to\mathrm{I}_{(a,b]}$ in $L_2(F)$ by dominated convergence.

Concerning denseness of $\mathcal{X}_{c,1}$ in $L_2(F_0)$, we may assume that $a > 0$. Drawing on the functions $g_\varepsilon$ define $h_\varepsilon(x) := \int\mathrm{I}_{\{0<y\le x\}}\,y^{-1}g_\varepsilon(y)\,dy$. Then $h_\varepsilon\in\mathcal{C}_{c,1}$ and, as before, $x h_\varepsilon'(x) = g_\varepsilon(x)\to\mathrm{I}_{(a,b]}(x)$ in $L_2(F_0)$.

A possible choice of the functions $\phi_n$, in the first case, is $\phi_n(x) = \phi(x/n)$, based on the function $2\phi(x) = 1 + \cos\bigl((|x|-\pi)_+\wedge\pi\bigr)$, and, in the second case, $\phi_n(x) = \phi(nx)$, where $2\phi(x) = 1 + \cos(|x|\wedge\pi)$. □
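The two explicit families at the end of the proof can be checked directly. The sketch below implements both, together with $x\phi_n'(x)$ for the first family (computed from the closed-form derivative), and evaluates them on a grid. The function names, the grid, and the sampled values of $n$ are assumptions of this example.

```python
import math

def phi_up(n, x):
    """phi_n(x) = phi(x / n) with 2 phi(u) = 1 + cos(min(max(|u| - pi, 0), pi))."""
    u = abs(x) / n
    return 0.5 * (1.0 + math.cos(min(max(u - math.pi, 0.0), math.pi)))

def phi_down(n, x):
    """phi_n(x) = phi(n x) with 2 phi(u) = 1 + cos(min(|u|, pi))."""
    return 0.5 * (1.0 + math.cos(min(abs(n * x), math.pi)))

def x_dphi_up(n, x):
    """x * phi_n'(x) for the first family; nonzero only for pi < |x| / n < 2 pi."""
    u = abs(x) / n
    if math.pi < u < 2.0 * math.pi:
        return -0.5 * u * math.sin(u - math.pi)
    return 0.0

grid = [k / 10.0 for k in range(-2000, 2001)]
# sup over the sampled n and x of |x phi_n'(x)|
sup_bound = max(abs(x_dphi_up(n, x)) for n in (1, 5, 25) for x in grid)
```

Since $x\phi_n'(x)$ depends on $x$ and $n$ only through $u = |x|/n$, the supremum is the same for every $n$, which is why the bound in Lemma A.1 is uniform.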
Absolute Continuity. From real analysis, e.g., Rudin (1974, Ch. 8), we recall: An $\mathbb{R}$-valued measure on the Borel $\sigma$-field $\mathbb{B}$ of the real line is dominated by $\lambda$, the Lebesgue measure, iff its distribution function is absolutely continuous. A function $f\colon\mathbb{R}\to\mathbb{R}$ is absolutely continuous, if for any $\varepsilon > 0$ there is a $\delta > 0$ such that for any finite collection of disjoint segments $(a_i,b_i]$ of total length $\lambda\bigl(\bigcup_i (a_i,b_i]\bigr) < \delta$ it holds that $\sum_i |f(b_i) - f(a_i)| < \varepsilon$. Any absolutely continuous $f$ has bounded variation on compact intervals $[a,b]$, the derivative $f'$ exists a.e. $\lambda$, and $f(b) - f(a) = \int_a^b f'\,d\lambda$ where $\int_a^b |f'|\,d\lambda < \infty$. Integrability $f'\in L_1(\lambda)$, implying bounded variation on $\mathbb{R}$, and the limit $f(a)\to 0$ as $a\to-\infty$ require further conditions, respectively. These are obviously satisfied in the location case for absolutely continuous densities $f$ such that $\mathcal{I}_l(F) < \infty$ for $dF = f\,d\lambda$, hence in particular $\int |f'|\,d\lambda < \infty$. If $f$ and $g$ are absolutely continuous, so is their product $fg$ on any compact $[a,b]$. Thus, integration by parts holds: $f(b)g(b) - f(a)g(a) = \int_a^b f'g\,d\lambda + \int_a^b fg'\,d\lambda$, a special case of Rieder (1994, Lemma C.2.1).

Proof of Theorem 2.2. First assume $\mathcal{I}_{s,1}(F) < \infty$. On $\mathcal{C}_{c,1}$ define $T(\phi) := -\int x\phi'\,dF$, which operator is well defined, because $\int\phi^2\,dF = 0$, in view of Definition 1.1, entails that $\int x\phi'\,dF = 0$.

Evaluated on $\mathcal{C}_{c,1}$, $T$ has operator norm $\sqrt{\mathcal{I}_{s,1}(F)}$. $\mathcal{C}_{c,1}$ being dense in $L_2(F)$, $T$ may be extended to $L_2(F)$ keeping its norm. By Riesz-Fréchet there exists some $g\in L_2(F)$, whose norm equals the operator norm of $T$, such that $T(\phi) = \int\phi\,g\,dF$ for all $\phi\in L_2(F)$, hence

$$-\int x\phi'\,dF = \int\phi\,g\,dF, \qquad \phi\in\mathcal{C}_{c,1}. \tag{A.1}$$

Inserting $\phi_n$ from Lemma A.1, both choices, we obtain that, in addition to $\int g^2\,dF = \mathcal{I}_{s,1}(F)$,

$$\int g\,dF = 0, \qquad g(0)\,F(\{0\}) = 0. \tag{A.2}$$

In particular, the integrals in (A.1) and (A.2) may be restricted to $\mathbb{R}\setminus\{0\}$. Define the function

$$f(x) := \frac{1}{x}\int_{y<x} g(y)\,F_0(dy), \qquad x\ne 0. \tag{A.3}$$



Then, if $\phi_{-\infty}$ denotes the constant value of $\phi\in\mathcal{C}_{c,1}$ left to the support of $\phi'$, $\int\phi\,g\,dF = \int(\phi - \phi_{-\infty})\,g\,dF_0$ and $\phi(x) - \phi_{-\infty} = \int_{y<x}\phi'(y)\,\lambda(dy)$. Due to compact support of $\phi'$, and $g\in L_2(F_0)$, the product $g(x)\phi'(y)$ is in $L_1\bigl(F_0(dx)\otimes\lambda(dy)\bigr)$, and so $\int x\phi'\,dF = -\iint_{x>y} g(x)\phi'(y)\,F_0(dx)\,\lambda(dy) = \int y f(y)\phi'(y)\,\lambda(dy)$ by Fubini; thus,

$$\int x\phi'(x)\,F_0(dx) = \int x\phi'(x) f(x)\,\lambda(dx), \qquad \phi\in\mathcal{C}_{c,1}. \tag{A.4}$$

By denseness of the class $\mathcal{X}_{c,1} = \{x\mapsto x\phi'(x)\}$ in $L_2(F_0)$, Lemma A.1, the LHS determines $F_0$. As pointwise and dominated convergence $x h_\varepsilon' = g_\varepsilon\to\mathrm{I}_{(a,b]}$ has been established in that proof, also $f\,d\lambda$ on the RHS is completely determined by (A.4) if $f\,d\lambda$ is finite on any compact in $\mathbb{R}\setminus\{0\}$. But $\int_A^B |f|\,d\lambda \le A^{-1}\int_A^B |x f(x)|\,\lambda(dx)$, which is bounded by $(B/A - 1)\int |g|\,dF_0 < \infty$ for $A > 0$, and likewise for $B < 0$. Thus we conclude from (A.4) that

$$dF_0 = f\,d\lambda. \tag{A.5}$$

Since $F_0$ is nonnegative, in fact $f\ge 0$ a.e. $\lambda$. Absolute continuity of the function $m$,

$$m(x) := \int_{y<x} g(y)\,F_0(dy) = \int_{y<x} g(y) f(y)\,\lambda(dy), \tag{A.6}$$

follows from $\int |g| f\,d\lambda = \int |g|\,dF_0 < \infty$. As $m(x) = x f(x)$ for $x\ne 0$, differentiability of $f$ a.e. $\lambda$ (for $x\ne 0$) is entailed by that of $m$, and

$$g(x) = 1 + x f'(x)/f(x) \quad\text{a.e. } F_0(dx). \tag{A.7}$$

This completes the identification of $g$ under $F$, and i)-iii) are proved.

Conversely, assume i)-iii). By ii), $m(x) = x f(x)$ is absolutely continuous. Differentiability of $m$ at $x\ne 0$ implies that of $f$, and $m' = f + x f'$. For $\lambda$-densities, necessarily $\lambda(f = 0, f'\ne 0) = 0$, hence also $\lambda(f = 0, m'\ne 0) = 0$. With $-\Lambda = m'/f = 1 + x f'/f$ a.e. $F_0$, we have $\int |m'|\,d\lambda = \int |\Lambda|\,dF_0 < \infty$ by iii). Thus, $m$ and its measure $m'\,d\lambda = -\Lambda\,dF_0$ are of bounded variation on $\mathbb{R}$.

By Hölder's inequality, $|m(y) - m(x)|^2 \le |F(y) - F(x)|\int\Lambda^2\,dF_0$, so $m(x)$ for $x\to\infty$ is a Cauchy sequence. But $\lim_{x\to\infty} m(x)$ must be zero since otherwise $f(x)\sim c/x$ with $c\ne 0$ for $x\to\infty$ would not integrate. The same holding for $x\to-\infty$, we obtain

$$\int m'\,d\lambda = 0. \tag{A.8}$$



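For $\phi\in\mathcal{C}_{c,1}$, the function $\phi - \phi_{-\infty}$ and corresponding measure $\phi'\,d\lambda$ have bounded variation on $\mathbb{R}$. Thus integration by parts in the general form of Rieder (1994, Lem. C.2.1) yields $\int\phi' m\,d\lambda = -\int\phi\,m'\,d\lambda$, such that

$$\int x\phi'\,dF = \int\phi' m\,d\lambda = -\int\phi\,m'\,d\lambda = \int\phi\,\Lambda\,dF_0. \tag{A.9}$$

Applying Cauchy-Schwarz, we get

$$\Bigl(\int x\phi'\,dF\Bigr)^2 = \Bigl(\int\phi\,\Lambda\,dF_0\Bigr)^2 \le \int\phi^2\,dF\int\Lambda^2\,dF_0, \tag{A.10}$$

where $\int\Lambda^2\,dF_0$ is finite by iii). It follows that $\mathcal{I}_{s,1}(F) < \infty$. □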

Proof of Proposition 3.1. We decompose $\bigl\|\sqrt{dF_{\sigma+t}} - \sqrt{dF_\sigma}\,(1 + \tfrac12 t\Lambda_\sigma)\bigr\|$ into the following sum,

$$\bigl\|\bigl(\sqrt{dF_{\sigma+t}} - \sqrt{dF_\sigma}\,(1 + \tfrac12 t\Lambda_\sigma)\bigr)\,\mathrm{I}_{\{0\}^c}\bigr\| + \bigl\|\bigl(\sqrt{dF_{\sigma+t}} - \sqrt{dF_\sigma}\,(1 + \tfrac12 t\Lambda_\sigma)\bigr)\,\mathrm{I}_{\{0\}}\bigr\|. \tag{A.11}$$

The first summand is $o(t)$ by Swensen (1980). The second is 0, since $F_\sigma(\{0\}) = F(\{0\})$ and $\Lambda_\sigma(0) = 0$. □